

Search for: All records

Creators/Authors contains: "Fang, Ethan X"


  1. Augmented Lagrangian (AL) methods have proven remarkably useful in solving optimization problems with complicated constraints. The last decade has seen the development of overall complexity guarantees for inexact AL variants. Yet, a crucial gap persists in addressing nonsmooth convex constraints. To this end, we present a smoothed augmented Lagrangian (AL) framework where nonsmooth terms are progressively smoothed with a smoothing parameter $$\eta_k$$. The resulting AL subproblems are $$\eta_k$$-smooth, allowing accelerated schemes to be leveraged. By carefully selecting the inexactness level for inexact subproblem resolution, the penalty parameter $$\rho_k$$, and the smoothing parameter $$\eta_k$$ at epoch $$k$$, we derive rate and complexity guarantees of $$\tilde{\mathcal{O}}(1/\epsilon^{3/2})$$ and $$\tilde{\mathcal{O}}(1/\epsilon)$$ in convex and strongly convex regimes for computing an $$\epsilon$$-optimal solution, when $$\rho_k$$ increases at a geometric rate; this is a significant improvement over the best available guarantees for AL schemes for convex programs with nonsmooth constraints. Analogous guarantees are developed for settings with $$\rho_k=\rho$$ as well as $$\eta_k=\eta$$. Preliminary numerics on a fused Lasso problem display promise.
    Free, publicly-accessible full text available August 1, 2026
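To make the schedule concrete, here is a minimal numerical sketch of the idea in the abstract above, not the paper's algorithm: a toy inequality-constrained problem whose nonsmooth constraint $$g(x)=\|x\|_1-\tau$$ is replaced by a Huber (Moreau-envelope) surrogate with parameter $$\eta_k$$, while the penalty $$\rho_k$$ grows geometrically and each $$\eta_k$$-smooth subproblem is solved inexactly by plain gradient steps. All names and constants are illustrative.

```python
import numpy as np

# Toy instance (illustrative, not from the paper): project c onto the l1-ball,
#   min 0.5*||x - c||^2   s.t.   g(x) = ||x||_1 - tau <= 0,
# with the nonsmooth g smoothed by a Huber (Moreau-envelope) surrogate.

def huber(t, eta):
    """eta-smooth surrogate of |t|: the Moreau envelope of the absolute value."""
    return np.where(np.abs(t) <= eta, t ** 2 / (2 * eta), np.abs(t) - eta / 2)

def huber_grad(t, eta):
    return np.clip(t / eta, -1.0, 1.0)

def smoothed_al(c, tau, epochs=12, inner=300, rho0=1.0, eta0=1.0, gamma=1.6):
    x, lam = np.zeros_like(c), 0.0
    for k in range(epochs):
        rho_k = rho0 * gamma ** k      # geometrically increasing penalty
        eta_k = eta0 / gamma ** k      # geometrically decreasing smoothing
        for _ in range(inner):         # inexact solve of the eta_k-smooth subproblem
            g = huber(x, eta_k).sum() - tau
            mult = max(0.0, lam + rho_k * g)
            grad = (x - c) + mult * huber_grad(x, eta_k)
            x = x - grad / (1.0 + rho_k / eta_k)   # conservative ~1/L step
        lam = max(0.0, lam + rho_k * (huber(x, eta_k).sum() - tau))
    return x

c = np.array([2.0, -1.0, 0.5])
x = smoothed_al(c, tau=1.0)
print(x, np.abs(x).sum())  # approaches [1, 0, 0], the l1-ball projection of c
```

An accelerated inner solver, as the paper's rates require, would replace the plain gradient loop; the geometric schedules for $$\rho_k$$ and $$\eta_k$$ are the part the complexity analysis keys on.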
  2. Estimating the unknown reward functions driving agents' behavior is a central challenge in inverse games and reinforcement learning. This paper introduces a unified framework for reward function recovery in two-player zero-sum matrix games and Markov games with entropy regularization. Given observed player strategies and actions, we aim to reconstruct the underlying reward functions. This task is challenging due to the inherent ambiguity of inverse problems, the non-uniqueness of feasible rewards, and limited observational data coverage. To address these challenges, we establish reward function identifiability using the quantal response equilibrium (QRE) under linear assumptions. Building on this theoretical foundation, we propose an algorithm to learn rewards from observed actions, designed to capture all plausible reward parameters by constructing confidence sets. Our algorithm works in both static and dynamic settings and is adaptable to incorporate other methods, such as Maximum Likelihood Estimation (MLE). We provide strong theoretical guarantees for the reliability and sample efficiency of our algorithm. Empirical results demonstrate the framework's effectiveness in accurately recovering reward functions across various scenarios, offering new insights into decision-making in competitive environments.
    Free, publicly-accessible full text available August 15, 2026
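A hedged sketch of the identifiability mechanism described above, for the entropy-regularized zero-sum matrix-game case only; the construction, variable names, and the damped fixed-point solver are assumptions, not the paper's algorithm. At a quantal response equilibrium, the observed strategies pin the reward matrix down only through linear equations, so any solver returns one member of a feasible set.

```python
import numpy as np

rng = np.random.default_rng(0)
n, tau = 3, 1.0
A = rng.normal(size=(n, n))   # hypothetical ground-truth reward matrix

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

# Forward model: the entropy-regularized QRE satisfies
#   x = softmax(A y / tau),  y = softmax(-A^T x / tau);
# compute it by a damped fixed-point iteration.
x = np.ones(n) / n
y = np.ones(n) / n
for _ in range(5000):
    x = 0.9 * x + 0.1 * softmax(A @ y / tau)
    y = 0.9 * y + 0.1 * softmax(-A.T @ x / tau)

# Inverse step: the QRE conditions are linear in A --
#   A y   =  tau*log(x) + c1*1,
#   A^T x = -tau*log(y) + c2*1,
# for unknown scalars c1, c2. Stack them as equations in the n*n entries
# of A and take the minimum-norm solution: one member of the feasible set.
rows, rhs = [], []
for i in range(n):            # row i of "A y - c1*1 = tau*log x"
    r = np.zeros(n * n + 2)
    r[i * n:(i + 1) * n] = y
    r[-2] = -1.0
    rows.append(r)
    rhs.append(tau * np.log(x[i]))
for j in range(n):            # row j of "A^T x - c2*1 = -tau*log y"
    r = np.zeros(n * n + 2)
    r[j::n] = x               # entries A[0,j], A[1,j], ... in row-major order
    r[-1] = -1.0
    rows.append(r)
    rhs.append(-tau * np.log(y[j]))
sol, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
A_hat = sol[:n * n].reshape(n, n)

# A_hat reproduces the observed equilibrium although A_hat != A in general.
print(np.allclose(softmax(A_hat @ y / tau), x, atol=1e-3))
```

The final check passes even though A_hat generally differs from A, which is exactly the non-uniqueness that motivates confidence sets over reward parameters rather than a point estimate.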
  3. Transformer models have achieved remarkable empirical successes, largely due to their in-context learning capabilities. Inspired by this, we explore training an autoregressive transformer for in-context reinforcement learning (ICRL). In this setting, we initially train a transformer on an offline dataset consisting of trajectories collected from various RL tasks, and then fix and use this transformer to create an action policy for new RL tasks. Notably, we consider the setting where the offline dataset contains trajectories sampled from suboptimal behavior policies. In this case, standard autoregressive training corresponds to imitation learning and results in suboptimal performance. To address this, we propose the Decision Importance Transformer (DIT) framework, which emulates the actor-critic algorithm in an in-context manner. In particular, we first train a transformer-based value function that estimates the advantage functions of the behavior policies that collected the suboptimal trajectories. Then we train a transformer-based policy via a weighted maximum likelihood estimation loss, where the weights are constructed based on the trained value function to steer the suboptimal policies to the optimal ones. We conduct extensive experiments to test the performance of DIT on both bandit and Markov Decision Process problems. Our results show that DIT achieves superior performance, particularly when the offline dataset contains suboptimal historical data.
    Free, publicly-accessible full text available August 15, 2026
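The weighted maximum-likelihood step in the abstract above can be sketched in a framework-free way. The softmax policy below is a stand-in for the transformer policy head, and the exponential advantage weighting exp(advantage/beta) with temperature beta is an assumption in the spirit of advantage-weighted regression, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(1)
K, N, beta = 4, 5000, 0.2                     # arms, samples, assumed temperature
true_means = np.array([0.1, 0.3, 0.9, 0.2])   # arm 2 is optimal
behavior = np.array([0.4, 0.3, 0.1, 0.2])     # suboptimal behavior policy

actions = rng.choice(K, size=N, p=behavior)
rewards = rng.normal(true_means[actions], 0.1)

# "Critic": per-arm value estimates and the advantage of each logged action
# relative to the behavior policy's average value.
q_hat = np.array([rewards[actions == a].mean() for a in range(K)])
adv = q_hat[actions] - behavior @ q_hat
w = np.exp(adv / beta)                  # advantage-based sample weights

# Weighted MLE: gradient ascent on sum_i w_i * log pi(a_i) for a softmax policy.
logits = np.zeros(K)
for _ in range(500):
    pi = np.exp(logits - logits.max())
    pi /= pi.sum()
    grad = np.bincount(actions, weights=w, minlength=K) - w.sum() * pi
    logits += 0.1 * grad / N
pi = np.exp(logits - logits.max())
pi /= pi.sum()
print(pi)   # most mass lands on arm 2 although the data rarely played it
```

With uniform weights this loop is plain imitation learning and reproduces the behavior policy; the advantage weights are what steer the fit toward the optimal arm.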
  4. We introduce IRIS, a geometric and heuristic-based scoring system for evaluating mathematical conjectures and theorems expressed as linear inequalities over numerical invariants. The IRIS score reflects multiple dimensions of significance—including sharpness, diversity, difficulty, and novelty—and enables the principled ranking of conjectures by their structural importance. As a tool for fully automated discovery, IRIS supports the generation and prioritization of high-value conjectures. We demonstrate its utility through case studies in convex geometry and graph theory, showing that IRIS can assist in both rediscovery of known results and proposal of novel, nontrivial conjectures. 
    Free, publicly-accessible full text available August 15, 2026
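Since the abstract does not spell out the scoring formula, the following is only a guessed-at shape of an IRIS-style score: a candidate inequality $$a^\top v \le b$$ over invariant vectors $$v$$ is rejected if any example falsifies it, and otherwise scored on sharpness (fraction of tight examples), difficulty (small average slack), and novelty (distance from known inequality directions). The paper's diversity dimension is omitted here, and every weight and formula below is an assumption.

```python
import numpy as np

def iris_score(a, b, V, known, tol=1e-6):
    """Score the conjecture 'a . v <= b' over example invariant vectors V."""
    slack = b - V @ a
    if (slack < -tol).any():                 # falsified by some example object
        return 0.0
    sharpness = np.mean(slack <= tol)        # fraction of tight (touching) examples
    difficulty = 1.0 / (1.0 + slack.mean())  # small average slack => stronger bound
    direction = a / np.linalg.norm(a)
    novelty = (min(np.linalg.norm(direction - k / np.linalg.norm(k)) for k in known)
               if known else 1.0)            # distance to known inequality directions
    return sharpness + difficulty + min(novelty, 1.0)

# Toy graph-theory use: v = (order n, size m) for three cycles and one tree.
# The conjecture m <= n, written as a = (-1, 1), b = 0, is tight on cycles.
V = np.array([[3.0, 3.0], [4.0, 4.0], [5.0, 5.0], [4.0, 3.0]])
print(iris_score(np.array([-1.0, 1.0]), 0.0, V, known=[]))   # 0.75 + 0.8 + 1.0
```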
  5. As systems grow manifold in scale and intricacy, the challenges of parametric misspecification become pronounced. These concerns are further exacerbated in compositional settings, which emerge in problems complicated by modeling risk and robustness. In “Data-Driven Compositional Optimization in Misspecified Regimes,” the authors consider the resolution of compositional stochastic optimization problems plagued by parametric misspecification. In considering settings where such misspecification may be resolved via a parallel learning process, the authors develop schemes that can contend with diverse forms of risk, dynamics, and nonconvexity. They provide asymptotic and rate guarantees for unaccelerated and accelerated schemes for convex, strongly convex, and nonconvex problems in a two-level regime, with extensions to the multilevel setting. Surprisingly, the nonasymptotic rate guarantees show no degradation from the rate statements obtained in a correctly specified regime, and the schemes achieve optimal (or near-optimal) sample complexities for general T-level strongly convex and nonconvex compositional problems.
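A minimal sketch of the coupled scheme described above, under assumed toy dynamics rather than the paper's general setting: a two-level compositional objective $$g(\mathbb{E}[h(x;\theta^*)])$$ with $$g(u)=u^2$$ and $$h(x;\theta)=x-\theta$$ is minimized by an SCGD-style method while the misspecified parameter $$\theta^*$$ is estimated by a parallel stochastic-approximation learner from its own data stream. All step-size choices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
theta_star = 3.0          # unknown true parameter (misspecified at the start)
theta = 0.0               # parallel learner's running estimate of theta_star
x, u = 10.0, 0.0          # decision variable and inner-expectation tracker

for k in range(1, 20001):
    # Parallel learning step: running average of noisy observations of theta_star.
    theta += (1.0 / k) * (theta_star + rng.normal() - theta)
    # Compositional step (SCGD-style): sample h, track its mean, chain-rule update.
    h = x - theta + 0.1 * rng.normal()            # noisy inner-function sample
    u += min(1.0, 5.0 / k ** 0.5) * (h - u)       # track E[h(x; theta_k)]
    x -= (0.5 / k ** 0.75) * (2.0 * u)            # g'(u) * dh/dx, with dh/dx = 1

print(x, theta)   # both estimates approach theta_star = 3
```

The point mirrored from the abstract: the optimizer never waits for theta to be fully learned, yet the drift from using the current estimate instead of the true parameter washes out at the rates of the correctly specified problem.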